A General Analytical Model of Adaptive Wormhole Routing in k-Ary n-Cube Interconnection Networks
Abstract
Several analytical models of fully adaptive routing have recently been proposed for k-ary n-cubes and hypercube networks under the uniform traffic pattern. Although the hypercube is a special case of the k-ary n-cube topology, the modeling approach for the hypercube is more accurate than that for k-ary n-cubes because of its simpler structure. This paper proposes a general analytical model to predict message latency in wormhole-routed k-ary n-cubes with fully adaptive routing, using a modeling approach similar to that of the hypercube. The analysis focuses on Duato's fully adaptive routing algorithm [12], which is widely accepted as the most general algorithm for achieving adaptivity in wormhole-routed networks while allowing for an efficient router implementation. The proposed model is general enough that it can be used for the hypercube and for other fully adaptive routing algorithms.

1. Introduction

It is widely recognised that one of the critical components of a multicomputer is the interconnection network used to connect the processing elements together. Most current multicomputers [6, 15, 22, 24, 25] employ k-ary n-cubes for low-latency and high-bandwidth inter-processor communication. The two most popular instances of k-ary n-cubes are the hypercube (where k=2) and the torus (where n=2). The former has been employed in multicomputers such as the N-Cube [22] and iPSC/2 [25], while the latter has been adopted in machines like the J-Machine [24], CRAY T3E [6] and CRAY T3D [15].

Modern parallel routers significantly reduce average latency by using wormhole switching [7]. Wormhole switching divides each packet into elementary units called flits, each a few bytes long, for transmission and flow control, and advances each flit as soon as it arrives at a node. The header flit (containing routing information) governs the route and the remaining data flits follow it in a pipelined fashion.
If a channel transmits the header of a message, it must transmit all the remaining flits of the same message before transmitting flits of another message. Once the header is blocked, the data flits are blocked in situ. Wormhole switching is attractive because it reduces the latency of message delivery compared to store-and-forward switching and requires only a few flit buffers per node.

Network throughput of wormhole-routed networks can be increased by organizing the flit buffers associated with each physical channel into several virtual channels [9]. These virtual channels are allocated independently to different packets and compete with each other for the physical bandwidth. This decoupling allows active messages to pass blocked messages using network bandwidth that would otherwise be wasted.

Most interconnection networks, including k-ary n-cubes, provide multiple physical paths for routing a message between two given nodes. This introduces the problem of choosing a route among many alternatives. Many practical multicomputers [15, 24] have adopted deterministic routing, where messages with the same source and destination addresses always take the same network path. This form of routing has been popular because it requires a simple deadlock-avoidance algorithm, resulting in a simple router implementation. However, messages cannot use alternative paths to avoid congested channels and thereby reduce their latency. Fully-adaptive routing has often been suggested to overcome this limitation by enabling messages to explore all available paths. Several authors, such as Duato [12], Lin et al. [20], and Su and Shin [29], have proposed fully-adaptive routing algorithms that achieve deadlock-freedom with a minimal requirement for virtual channels, allowing for an efficient router implementation.

ISBN: 1-56555-269-5 547 SPECTS '03

Analytical models of deterministic routing in common wormhole-routed networks, including the k-ary n-cube, have been widely reported in the literature [2, 4, 5, 11, 14, 17].
Several researchers have recently proposed analytical models of fully-adaptive routing under the uniform traffic pattern [4, 26, 28]. For instance, Boura et al. [4] have proposed a model of fully-adaptive routing in the hypercube. The authors in [26, 28] have recently described models for high-radix k-ary n-cubes. The most difficult part of developing any analytical model of adaptive routing is computing the probability of message blocking at a given router, because of the number of combinations that must be considered when enumerating the paths that a message may have used to reach its current position in the network. The problem is exacerbated as the network dimensionality increases, because the number of alternative paths grows. The model in [28] computes exact expressions for the probability of message blocking at a given router by considering all the possible paths that enable a message to cross from its source to its current position in the network. However, the model is very time consuming because it recursively calculates message blocking for each node in each iteration of the message latency calculation.

This paper proposes an alternative analytical model for computing the mean message latency in k-ary n-cubes with fully-adaptive routing. The derivation of the model is similar to that of the hypercube model presented in [4] and is general enough to be used for both k-ary n-cubes and hypercubes. As in previous similar studies [4, 26, 28], the present analysis uses Duato's fully adaptive routing algorithm [12]. This form of routing is widely accepted as one of the most general fully-adaptive routing algorithms for wormhole-routed networks, leading to an efficient router implementation. The Cray T3E [6] and the Reliable Router [10] are two examples of recent practical systems that have adopted Duato's routing algorithm. However, the modelling approach can easily be adapted to other fully-adaptive routing algorithms (e.g. [3, 18, 21]).
The rest of the paper is organised as follows. Section 2 reviews some definitions and background that will be useful for the subsequent sections. Section 3 presents the analytical model and, finally, Section 5 concludes this study.

2. Preliminaries

The unidirectional k-ary n-cube, where k is referred to as the radix and n as the dimension, has $N = k^n$ nodes, arranged in n dimensions, with k nodes per dimension. Each node can be identified by an n-digit radix-k address $(a_1, a_2, \ldots, a_n)$. The i-th digit of the address vector, $a_i$, represents the node position in the i-th dimension. The node with address $(a_1, a_2, \ldots, a_n)$ is linked to node $(b_1, b_2, \ldots, b_n)$ if and only if there exists $i$ $(1 \le i \le n)$ such that $a_i = (b_i + 1) \bmod k$ and $a_j = b_j$ for $1 \le j \le n$, $j \ne i$. Thus, each node is connected to a neighbouring node in each dimension.

Each node consists of a processing element (PE) and a switching element (SE), or router. The PE contains a processor and some local memory. The router has $(n+1)$ input and $(n+1)$ output channels. A node is connected to its neighbouring nodes through n input and n output channels in a unidirectional k-ary n-cube. The remaining channels are used by the PE to inject messages into and eject messages from the network, respectively. Messages generated by the PE are transferred to the router through the injection channel. Messages at the destination are transferred to the local PE through the ejection channel. Each physical channel is associated with some number, say V, of virtual channels. A virtual channel has its own flit queue, but shares the bandwidth of the physical channel with other virtual channels in a time-multiplexed fashion [7]. The router contains flit buffers for each incoming virtual channel. An $(n+1)V$-way crossbar switch directs message flits from any input virtual channel to any output virtual channel. Such a switch can simultaneously connect multiple input virtual channels to multiple output virtual channels, provided there are no conflicts.
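The addressing and link rule above can be illustrated with a small sketch. This is not part of the paper's model: the function names `all_nodes` and `linked_nodes` are hypothetical, digits are indexed from 0 rather than 1, and the sketch only enumerates the nodes b that satisfy the stated condition for a given node a, without assuming which endpoint is the source of the channel.

```python
from itertools import product


def all_nodes(k, n):
    """Enumerate every n-digit radix-k address; there are k**n of them."""
    return list(product(range(k), repeat=n))


def linked_nodes(a, k):
    """Return the nodes b linked to node a under the rule in the text.

    For exactly one dimension i, a[i] == (b[i] + 1) % k, and all other
    digits of a and b are equal. One such b exists per dimension.
    """
    result = []
    for i, digit in enumerate(a):
        b = list(a)
        b[i] = (digit - 1) % k  # solves a[i] == (b[i] + 1) % k for b[i]
        result.append(tuple(b))
    return result


if __name__ == "__main__":
    k, n = 4, 3
    print(len(all_nodes(k, n)))        # number of nodes, k**n
    print(linked_nodes((0, 2, 3), k))  # one linked node per dimension
```

Each node has exactly one linked node per dimension under this rule, which matches the n network input and n network output channels per router described above.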
Deadlock-free fully-adaptive routing algorithms that require only one extra virtual channel compared to deterministic routing have been discussed in [12, 13, 29]. Of these, Duato's fully-adaptive routing algorithm is the best known and the most widely used in studies and in practice, as it provides the maximum adaptivity with the minimum number of virtual channels. Duato's algorithm [12] divides the virtual channels into two classes: a and b. At each routing step, a message adaptively visits any available virtual channel from class a. If all the virtual channels belonging to class a are busy, it visits a virtual channel from class b using deterministic routing. The virtual channels of class b define a complete deadlock-free virtual sub-network, which acts like a drain for the virtual sub-networks built from virtual channels belonging to class a.

In k-ary n-cubes, Duato's algorithm requires at least three virtual channels per physical channel to ensure deadlock-freedom, where class a contains one virtual channel and class b contains two virtual channels. When there are more than three virtual channels, network performance is maximised when the extra virtual channels are added to class a [12, 13]. Thus, with V virtual channels per physical channel, the best performance is achieved when classes a and b contain V-2 and 2 virtual channels respectively. When the network is a hypercube (k=2), however, the arrangement of virtual channels is different. In this case Duato's algorithm requires at least one virtual channel in class b, and all the remaining virtual channels are placed in class a.

3. Analysis

The model uses assumptions that are widely used in the literature [1, 2, 4, 5, 8, 9, 11, 14, 17, 26, 28].
a) Nodes generate traffic independently of each other, following a Poisson process with a mean rate of λ messages per cycle.
b) The arrival process at a given channel is approximated by an independent Poisson process.
c) Message destinations are uniformly distributed across network nodes.
d) Message length is fixed and equal to M flits, each of which is transmitted in one cycle from one router to the next.
e) The local queue at the injection channel in the source node has infinite capacity. Moreover, messages are transferred to the local PE as soon as they arrive at their destinations through the ejection channel.
f) V virtual channels are used per physical channel. Class a contains $(V-2)$ virtual channels, which are crossed adaptively. Class b, on the other hand, contains two virtual channels, which are crossed deterministically. Let the virtual channels belonging to classes a and b be called the adaptive and deterministic virtual channels respectively. When more than one adaptive virtual channel is available, a message chooses one at random. To simplify the model derivation, no distinction is made between the deterministic and adaptive virtual channels when computing virtual channel occupancy probabilities [4, 26, 28].

The model computes the mean message latency as follows. First, the mean network latency, $\bar{S}$, that is, the time to cross the network, is determined. Then, the mean waiting time seen by a message in the source node, $\bar{W}_s$, is evaluated. Finally, to model the effects of virtual channel multiplexing, the mean message latency is scaled by a factor, $\bar{V}$, representing the average degree of virtual channel multiplexing that takes place at a given physical channel. Therefore, the mean message latency can be written as

$$\text{Latency} = (\bar{S} + \bar{W}_s)\,\bar{V} \qquad (1)$$

The average number of hops that a message makes across the network, $\bar{d}$, is given by ∑ = = max
Publication date: 2003